Automatically Generated Consumer Health Metadata Using Semantic Spaces

Authors

  • Ping Yu
  • James R. Warren
  • John Yearwood
  • Guocai Chen
  • Jim Warren
  • Joanne Evans
Abstract

The continual growth of the World Wide Web presents the (also growing) population of health information seekers with the challenge of finding reliable information that is appropriate to their needs. Metadata about consumer health websites can provide a guide for end users and domain-specific search tools. In this paper we present and demonstrate a method for automatically inferring a non-trivial metadata attribute that has been encoded for breast cancer websites: whether the site is ‘medical’ or ‘supportive’ in tone. We induce decision trees to distinguish Medical vs. Supportive sites based on feature vectors of word co-occurrence patterns, founded in a semantic space model called Hyperspace Analog to Language (HAL). We achieve 82% (95% CI: 74% to 91%) classification accuracy. This should already be a useful capability for human metadata coders or to support on-the-fly queries, and it inspires us to further investigate metadata classifiers based on HAL features.
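
As a rough illustration of the kind of word co-occurrence features the abstract refers to, the following is a minimal sketch of a HAL-style weighted co-occurrence table. It is not the authors' implementation: the function names `hal_matrix` and `feature_vector`, the window size, the whitespace tokenization, and the toy sentence are all assumptions made for illustration, and the decision-tree induction step is not shown.

```python
from collections import defaultdict


def hal_matrix(tokens, window=5):
    """Build a HAL-style weighted co-occurrence table (illustrative sketch).

    hal[word][ctx] accumulates a weight each time `ctx` appears up to
    `window` positions before `word`. Adjacent words get weight `window`,
    and the weight drops by 1 for each extra position of distance.
    """
    hal = defaultdict(lambda: defaultdict(int))
    for i, word in enumerate(tokens):
        # Only look back as far as the start of the token sequence.
        for dist in range(1, min(window, i) + 1):
            hal[word][tokens[i - dist]] += window - dist + 1
    return {w: dict(ctx) for w, ctx in hal.items()}


def feature_vector(hal, word, vocab):
    """Concatenate preceding-context and following-context weights for `word`."""
    before = [hal.get(word, {}).get(c, 0) for c in vocab]
    after = [hal.get(c, {}).get(word, 0) for c in vocab]
    return before + after


# Toy example (hypothetical text, not data from the paper).
tokens = "breast cancer support group".split()
hal = hal_matrix(tokens, window=2)
vocab = sorted(set(tokens))
print(feature_vector(hal, "cancer", vocab))
```

Vectors of this kind, computed for chosen words over each website's text, could then be passed to any decision-tree learner to separate Medical from Supportive sites. Which words and window size the paper actually uses is not stated in the abstract.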

Similar resources

Exploring the Value of Folksonomies for Creating Semantic Metadata

Finding good keywords to describe resources is an on-going problem. Typically, we select such words manually from a thesaurus of terms, or they are created using automatic keyword extraction techniques. Folksonomies are an increasingly well-populated source of unstructured tags describing Web resources. This article explores the value of the folksonomy tags as a potential source of keyword meta...

Natural Language Processing and Web Mining Application of Social Analytics for Business Information Systems

Social networking tools, blogs and microblogs, user-generated content sites, discussion groups, problem reporting, and other social services have transformed the way people communicate and consume information. Yet managing this information is still a very onerous activity for both the consumer and the provider; the information itself remains passive. Traditional methods of keyword extraction fr...

Theme Creation for Digital Collections

This paper presents an approach for integrating multiple sources of semantics for creating metadata. A new framework is proposed to define topics and themes with both manually and automatically generated terms. The automatically generated terms include terms from a semantic analysis of the collections and terms from previous users' queries. An interface is developed to facilitate the creat...

Automatic and Manual Annotation Using Flexible Schemas for Adaptation on the Semantic Desktop

Adaptive Hypermedia builds upon the annotation and adaptation of content. As manual annotation has proven to be the main bottleneck, all means for supporting it by reusing automatically generated metadata are helpful. In this paper we discuss two issues. The first is the integration of a generic AH authoring environment MOT into a semantic desktop environment. In this setup, the semantic deskto...

Efficient Crowdsourcing for Metadata Generation

Rich and correct metadata still plays a central role in accessing data sources in a semantic fashion. However, at the time of content creation it is often virtually impossible to foresee all possible uses of content and to provide all interesting index terms or categorizations. Therefore semantic retrieval techniques have to provide ways of allowing access to data via missing metadata, which is...

Publication date: 2007